Robust Real-time and Rotation-invariant American Sign Language Alphabet Recognition Using Range Camera
Abstract
The automatic interpretation of human gestures can be used for natural interaction with computers without mechanical devices such as keyboards and mice. The recognition of hand postures has been studied for many years. However, most of the literature in this area has considered 2D images, which cannot provide a full description of hand gestures. In addition, rotation-invariant identification remains an unsolved problem, even with the use of 2D images. The objective of the current study is to design a rotation-invariant recognition process that uses a 3D signature to classify hand postures. A heuristic, voxel-based signature has been designed and implemented. The hand motion is tracked with a Kalman filter. A single training image per posture is used in the supervised classification. The designed recognition process and the tracking procedure have been successfully evaluated. This study has demonstrated the efficiency of the proposed rotation-invariant 3D hand posture signature, which leads to a 98.24% recognition rate after testing 12723 samples of 12 gestures taken from the alphabet of American Sign Language.
1. OBJECTIVE AND RELATED WORK
The objective of the current study is to design a system that recognizes the alphabet of American Sign Language in real time. Contrary to existing methods, it allows hand posture recognition independently of the orientation of the user’s hand. It makes use of a 3D signature, considers only one training image per posture and uses a significant number of testing images for its evaluation. A lot of research has been conducted on this topic, but most of it does not exploit the advantage that a 3D signature can provide.
For example, in Yunli and Keechul (2007), after generating a point cloud of a hand posture from data captured with four web cameras, the authors use cylindrical virtual boundaries to randomly extract five slices of the point cloud. Each slice is processed by analyzing the point cloud distribution, and the hand posture is recognized from this analysis. As a result, although the hand postures are represented by a 3D point cloud, the full 3D topology is not considered in the recognition process. Other researchers, though using a 3D sensor, do not consider the third dimension at all in the features used to represent the hand postures. That is the case in Guan-Feng et al. (2001), where the authors use a 3D depth camera but only consider the 2D outline of the hand segment in their recognition process.
The design of a rotation-invariant system has not been successfully achieved so far. Many researchers use principal component analysis to evaluate the orientation of the 2D hand image but, as acknowledged by Uebersax et al. (2012), this method is not always accurate. Not only has the estimation of the rotation of a 2D hand segment not been successful so far, but the evaluation of the orientation of a 3D hand segment is not considered in most existing approaches.
To test their hand motion classification using a multi-channel surface electromyography sensor, Xueyan et al. (2012) only consider five testing images per gesture. Contrary to most of the studies on this topic, the current study uses a significant number of testing samples to validate the proposed algorithm. Indeed, testing 1000 images instead of 5 provides more evidence of the robustness of the methodology. The proposed method uses a single training image per posture, with the objective of showing both its robustness and its suitability for a real-time application.
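The PCA-based orientation estimate discussed in this section can be illustrated with a minimal sketch. It is not taken from any of the cited papers. The function names `principal_axis_angle` and `rotate`, and the synthetic elongated blob standing in for a segmented 2D hand, are assumptions made purely for illustration. The sketch computes the direction of largest variance of a set of 2D points in closed form from the covariance terms.

```python
import math


def principal_axis_angle(points):
    """Angle (radians, in (-pi/2, pi/2]) of the main axis of a 2D point set.

    Uses the closed-form direction of the first principal component
    of the 2x2 covariance matrix of the points.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical stand-in for a hand segment: a blob that is long along x
# and thin along y, then rotated by 30 degrees.
blob = [(float(t), float(w)) for t in range(-10, 11) for w in (-1, 0, 1)]
tilted = rotate(blob, math.radians(30.0))

angle_deg = math.degrees(principal_axis_angle(tilted))
flipped_deg = math.degrees(principal_axis_angle(rotate(tilted, math.pi)))
```

In this sketch a blob turned by a further 180 degrees yields the same angle, because a principal axis has no direction. This is one known limitation of PCA-based orientation estimates. Whether it is the limitation the cited authors had in mind is not stated in the text.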
To track the hand motion during the real-time process, a Kalman filter is proposed, with a detailed explanation of how the process noise and the measurement noise have been modelled. To achieve these objectives, the sensor used is the SR4000 range camera, because of its ability to provide 3D images at video rates. For further details, please refer to Lange (2001), who provides an exhaustive explanation of the SR4000’s principles.
This paper is structured as follows: Section 2 describes the setup of the experiment, and Section 3 the methodology for tracking the hand motion and its evaluation. Section 4 presents the recognition principle. The rotation invariance algorithm is described in Section 5. The experimental results...
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B5, 2012 XXII ISPRS Congress, 25 August – 01 September 2012, Melbourne, Australia
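The Kalman filter tracking mentioned in the text can be sketched as follows. This excerpt does not give the paper's motion model or noise values, so everything specific here is an assumption: the constant-velocity model, the white-acceleration form of the process noise, the class name `ConstantVelocityKalman1D`, and the numeric variances. The sketch tracks a single coordinate. A 3D hand position could be handled by running one such filter per axis.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position-only measurements."""

    def __init__(self, dt, process_var, measurement_var, initial_var=1000.0):
        self.dt = dt
        self.q = process_var        # assumed acceleration noise variance
        self.r = measurement_var    # assumed measurement noise variance
        self.position = 0.0
        self.velocity = 0.0
        # State covariance P = [[p00, p01], [p10, p11]], large initial uncertainty.
        self.p00, self.p01 = initial_var, 0.0
        self.p10, self.p11 = 0.0, initial_var

    def predict(self):
        dt = self.dt
        # x = F x with F = [[1, dt], [0, 1]]
        self.position += dt * self.velocity
        # P = F P F^T + Q, with Q from a white-acceleration model
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11
        self.p00 = p00 + self.q * dt ** 4 / 4.0
        self.p01 = p01 + self.q * dt ** 3 / 2.0
        self.p10 = p10 + self.q * dt ** 3 / 2.0
        self.p11 = p11 + self.q * dt ** 2

    def update(self, measured_position):
        # Innovation and its variance, with H = [1, 0]
        innovation = measured_position - self.position
        s = self.p00 + self.r
        k0 = self.p00 / s           # gain for position
        k1 = self.p10 / s           # gain for velocity
        self.position += k0 * innovation
        self.velocity += k1 * innovation
        # P = (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11

    def step(self, measured_position):
        self.predict()
        self.update(measured_position)
        return self.position


# Usage: a target moving at a constant 2 units per frame, measured without noise.
kf = ConstantVelocityKalman1D(dt=1.0, process_var=0.01, measurement_var=1.0)
for frame in range(1, 51):
    kf.step(2.0 * frame)
```

In practice the process and measurement variances would be derived from the sensor and the expected hand dynamics, as the text says the paper does. The values above are placeholders.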
Similar resources
Towards Real-Time and Rotation-Invariant American Sign Language Alphabet Recognition Using a Range Camera
The automatic interpretation of human gestures can be used for a natural interaction with computers while getting rid of mechanical devices such as keyboards and mice. In order to achieve this objective, the recognition of hand postures has been studied for many years. However, most of the literature in this area has considered 2D images which cannot provide a full description of the hand gestu...
Vision Based Hand Gesture Recognition Using Fourier Descriptor for Indian Sign Language
Indian Sign Language (ISL) interpretation is the major research work going on to aid Indian deaf and dumb people. Considering the limitation of glove/sensor based approach, vision based approach was considered for ISL interpretation system. Among different human modalities, hand is the primarily used modality to any sign language interpretation system so, hand gesture was used for recognition o...
Robust Sign Language Recognition System Using ToF Depth Cameras
Sign language recognition is a difficult task, yet required for many applications in real-time speed. Using RGB cameras for recognition of sign languages is not very successful in practical situations and accurate 3D imaging requires expensive and complex instruments. With introduction of Time-of-Flight (ToF) depth cameras in recent years, it has become easier to scan the environment for accura...
A New Approach For Hand Gestures Recognition Based on Depth Map Captured by RGB-D Camera
This paper introduces a new approach for hand gesture recognition based on depth Map captured by an RGB-D Kinect camera. Although this camera provides two types of information ”Depth Map” and ”RGB Image”, only the depth data information is used to analyze and recognize the hand gestures. Given the complexity of this task, a new method based on edge detection is proposed to eliminate the noise a...
Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video
We present two real-time hidden Markov model-based systems for recognizing sentence-level continuous American Sign Language (ASL) using a single camera to track the user’s unadorned hands. The first system observes the user from a desk mounted camera and achieves 92% word accuracy. The second system mounts the camera in a cap worn by the user and achieves 98% accuracy (97% with an unrestricted ...
Publication date: 2012